Glass transition in secondary structures formed by random RNA sequences 
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Formation of RNA secondary structures is an example of the sequence-structure problem om- 
nipresent in biopolymers. A theoretical question of recent interest is whether a random RNA 
sequence undergoes a finite temperature glass transition. We answer this question affirmatively by 
first establishing the perturbative stability of the high temperature phase via a two replica calcula- 
tion. Subsequently, we show that this phase cannot persist down to zero temperature by considering 
energetic contributions due to rare regions of complementary subsequences. 
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Introduction: RNA is an important biopolymer critical to 
all living systems Like in DNA, there are four types 
of nucleotides (or bases) A, C, G, and U which, when 
polymerized can form double helical structures consist- 
ing of stacks of stable Watson-Crick (A-U or G-C) pairs. 
However unlike a long polymer of DNA, which is often 
accompanied by a complementary strand and forms oth- 
erwise featureless double helical structures, a polymer of 
RNA is usually single-stranded. It bends onto itself and 
forms elaborate spatial structures in order for bases lo- 
cated on different parts of the backbone to pair with each 
other. The structures encoded by the primary sequences 
often have important biological functions, much like how 
the primary sequence of a protein encodes its structure. 

Understanding the encoding of structure from the pri- 
mary sequence has been an outstanding problem of theo- 
retical biophysics. Most work in the past decade has been 
focused on the problem of protein folding, which is very 
difficult analytically and numerically j^]. Here, we study 
the problem of RNA folding, specifically the formation 
of RNA secondary structures which is more amenable to 
analytical and numerical approaches due to a separation 
of energy scales [§. Efficient algorithms j|, |^ together 
with carefully measured free energy parameters de- 
scribing the formation of various microscopic structures 
(e.g., stacks, loops, hairpins, etc.) allow the exact calcu- 
lation of the ensemble of secondary structures formed by 
a given RNA molecule of up to a few thousand bases. 

In this work, we are not concerned with the structure 
formed by a specific sequence. Instead, we will study the 
statistics of secondary structures formed by the ensemble 
of long random RNA sequences. It has been debated re- 
cently whether such an ensemble undergoes a finite tem- 
perature glass transition |^. However, the numerical 
results these studies are based on are not clear enough 
to allow unambiguous interpretation. In this letter, we 
provide analytical evidence supporting the existence of a 
finite temperature glass transition by studying some toy 
models of RNA folding. We characterize the behavior of 
RNA in the absence of disorder and show with the help of 
a two-replica calculation that disorder is perturbatively 



irrelevant. We then show that the assumption that the 
high-temperature phase exists at all temperatures leads 
to a contradiction below some finite temperature, thereby 
necessitating a finite temperature phase transition. 

Model: A secondary structure S' of a polymer of RNA 
is the set of pairings formed between all of its monomers 
(or bases), with each base allowed in at most one pairing. 
We denote the pairing between the i^^ and j^^ monomer 
by with l<i<j<N. Each such structure can be 

represented by a diagram like the one shown in Fig. 0(a). 
In addition, it is common to exclude "pseudo-knots" like 
the one shown in Fig. |^(b) from the definition of sec- 
ondary structures, so that any two base pairs (i,j) and 
{k,l) are either independent, i.e., i<j<k<l, or nested, 
i.e., i < k <l < j . This is permissible since long pseudo- 
knots are kinetically difficult to form and even the short 
ones occur infrequently due to energetic reasons Q . Ex- 
perimentally, it is possible to "turn off" the pseudo-knots 
and other complicated tertiary contacts |^ and study ex- 
clusively the class of secondary structures defined above. 




FIG. 1: Secondary structures of an RNA molecule: The solid 
and dashed lines represent the backbone and the base pairs 
respectively, (a) shows a valid secondary structure while (b) 
contains a pseudo-knot as indicated by the arrow. 

In order to calculate Boltzmann factors for an ensem- 
ble of secondary structures, we need to assign an energy 
E[S] to each structure S. For the purpose of secondary 
structure prediction, it is essential to model the energy 
as accurately as possible |Q, ||. However, for our inter- 
est in the universal statistical properties of long, random 
RNA sequences far below the denaturation temperature, 
it suffices to consider simplified models along the line of 
those used in earlier studies |0, ^, ^, |l^ . 

We associate an interaction energy Sij with every pair- 



ing and assign E[S] = J2 



Ei 1 as the total en- 
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ergy of the structure S. To retain the spirit of Watson- 
Crick pairing, we choose random sequences bi . . .Bn of 
the four bases A, U, C, and G and then assign 

_ f —Urn {bi,bj) is a Watson-Crick base pair 
\ Umm otherwise 

with Um,Urnm > 0. Here, Um sets the energy scale. The 
value of Umm is not essential as long as it is repulsive, 
since there is always the option to not bind at all (with 
energy "0" ) rather than to bind with a repulsive energy. 

For analytical work, it is convenient to take the Sij to 
be independent Gaussian random variables specified by 
the mean s and variance 

{sij - e){ek,i -e) = D 5i^k5j,i- (2) 

Throughout the text, we will use the overline to denote 
averages over the ensemble of random pairing energies. 
Here, the parameter D provides a measure of the strength 
of the randomness. It is an approximation to the model 
(|l|) in two respects: First, it replaces the discrete distribu- 
tion of energies by a Gaussian distribution. Moreover, it 
neglects the correlations between Si^j and ei^k induced by 
the shared base bi. We do not anticipate universal quan- 
tities to depend on the subtle differences in the statistics 
of the Eij's. This will be tested numerically by compar- 
ing the scaling behavior produced by the two models. 

Given the energy of each secondary structure, we can 
study the partition function Z{N) — Y1,s&s(n) e~^^^^^ 
where S{N) comprises all valid secondary structures of a 
molecule of length N . This function can be conveniently 
computed in terms of the resiricied partition function Zij 
for the substrand bi . . .bj. For the simple energy model 
E[S] described above, Zi,j can be split up according to 
the possible pairings of position j to yield the exact re- 
cursion relation §, |ll|, |l| 

= + J2 Zr,k-i ■ e-l^'"-^ ■ Zk+i.j-i. (3) 

k—i 

This equation can then be iterated to compute the full 
partition function Z{N) — Zi^n for arbitrary interaction 
energies Sij's in 0{N'^) time |l^. It also forms the 
basis of analytical approaches to the problem. 

The molten phase: If sequence disorder does not play an 
important role, we may describe the RNA molecule by 
replacing all the binding energies Sij by some effective 
value £o- As we will see later, this is an adequate descrip- 
tion of random RNA at high enough temperatures (but 
below denaturation.) Below we briefly review the prop- 
erties of RNA in this high temperature "molten phase" . 

Since the -^ij 's become translationally invariant in the 
absence of disorder, it is straightforward to solve Eq. (||) 
for the molten phase partition function Zq{N) using the 
z-transform. For large N, it has the form |ll], |l^ 

Zo(iV) N-'« exp[-/37V/o(g)] (4) 



where q = e~^^° is the only parameter, = 3/2 is a 
universal exponent, and /o((7) = — fcBTln(l-|-2y^) is the 
free energy per length. 

One useful observable characterizing the state of the 
RNA is the free energy cost AF{N) of pinching together 
one end (say i — I) and the mid-point {i — N/2) of the 
polymer relative to the unperturbed state. The pinch 
effectively separates the polymer into two pieces of length 
In the molten phase, we simply have 

AF = ^kBT\n[Zo{N)/Z^iN/2)] « ffcsTlniV. (5) 

It reflects the loss of configurational entropy in the al- 
lowed secondary structures due to the pinch. 

Numerics: Before we delve into the analysis, we first 
present some numerical evidence for a phase transition 
between the high- and low- temperature behavior for the 
ensemble of random RNA. To this end, we generate many 
configurations of interaction energies £i,j's according to 
both models (|l|) and (^), with u„ = u„„ = l and e = 1/2, 
£' = 3/4 respectively. Then for a wide range of tempera- 
tures we calculate the pinching free energy, which is given 
by AF = -kBT\n{e-f^'^'«/^Z2M/2-iZN/2+i,N/Zi,N) in 
terms of the quantity Zij in (0) for sequences without 
translational invariance. The result is then averaged over 
1000 to 10000 disorder configurations and illustrated in 
Fig. 1^ for two representative temperatures. 

At high temperature {ksT — 2), the pinch energy fol- 
lows precisely the molten phase behavior expected ac- 
cording to Eq. (^ up to an additive constant. Thus, 
at high temperatures the molten phase description is 
applicable even if the interaction energies e^j are not 
uniform. Moreover, we find no difference between the 
two models of disorder at ksT = 2. At low tempera- 
ture (fcfiT = 0.025), the picture is different. The length 
dependence of AF is still logarithmic. However, the pref- 
actors differ between the two disorder choices and they 
are both by far larger than the prefactor 0.0375 expected 
if the molten phase result is extrapolated to this temper- 
ature. This suggests that there is a distinct low temper- 
ature phase; this finding is reinforced by similar changes 
detected in other observables [|l3j. 

High temperature behavior: Now we will establish the sta- 
bility of the molten phase against weak disorder by a per- 
turbative analysis. Assuming that the specific choice of 
disorder does not matter at weak disorder (or high tem- 
perature) as supported by the numerical results above, 
we will use the uncorrelated Gaussian disorder charac- 
terized by Eq. (||) for the purpose of this analysis. The 
behavior of the system at weak disorder is determined 
by the lowest order terms in the perturbative expan- 
sion of the ensemble averaged free energy in the disorder 
strength D. This term is given by the two-replica parti- 
tion function Z'^{N) of two RNA molecules sharing the 
same disorder. With the uncorrelated Gaussian energies 
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FIG. 2: Ensemble averaged pinching free energy AF{N) for 
two different dioices of disorder energies according to Eqs. (0) 
and (|), at ksT = 2 and ksT = 0.025. The statistical error 
is no larger than the size of the symbols. At ksT = 2, the 
data follows precisely the expected 3 In A'' behavior (dashed 
line.) At ksT = 0.025, the best fits for the data at A > 160 
to a logarithmic behavior (dotted lines) give AF ~ 0.90 In A 
for sequence and AF ~ 1.18 In A for Gaussian disorder. 

the ensemble average can be explicitly performed, 
yielding after some algebra 



{Si,S2G5(iV)} 



where q = exp (— /3e + \ fi^D) and g = exp [0^D) are the 
two relevant "Boltzmann factors", and \Si\ and \Si 5*^1 
are the number of bases contained in structure Si or com- 
mon to Si and 5*^ respectively. This effective partition 
function has a simple physical interpretation: It describes 
two RNA molecules subject to a homogeneous attraction 
with effective interaction energy eo = — h\q = £—^l3D 
between any two bases of the same molecule. In addition, 
there is an inter-replica attraction characterized by the 
factor q for each bond shared between the two replicas. 
The inter-replica attraction is induced by the same dis- 
order shared by the replicas. It can potentially force the 
replicas to "lock" together, i.e., to become correlated. 

It turns out that this two-replica problem can be 
solved exactly [^S). The key idea leading to the solu- 
tion is that the sum over the pairs of secondary struc- 
tures in Eq. can be reordered by first summing 
over the bonds which are common to the two struc- 
tures, noting that the possible configurations of the com- 
mon bonds are themselves the set S{N) of valid sec- 
ondary structures. Within a given configuration of com- 
mon bonds, all possibilities to place non-common bonds 
in the two individual structures can then be summed 
over, leading to an effective single RNA problem. How- 
ever, the necessary algebra is quite involved here 
we just quote the results. The solution has the form 
Z^{N) - Ar-^exp[-/3Af/(g,q)], with two different ex- 
pressions for 9 and / depending on whether q is above 
or below a critical value qc = 1 + l/i?^ Sn'=i ^5^(^)1: 
where g{N) = Zo(iV)/(l-h2^)^-i- Af-3/2 for large A^. 

For q< qc, we have = 29q = 3. In addition, f{q,q) 



is basically a modified version of 2fo{q). In this regime, 
the two-replica partition function Z"^ (N) is essentially a 
product of two single-replica partition functions Zo{N). 
Since there is no coupling between the two replicas, we 
conclude that the effect of disorder is irrelevant in this 
regime. For q > qc, we have 9 = 3/2 and / given by a 
complicated function of Zq{N) Here, the two-replica 
partition function has the same asymptotic form as that 
of the single-replica system in (^. This implies that the 
disorder coupling locks the two replicas together. 

Of course, as already explained above, only the weak- 
disorder limit (i.e., q^l) of the two-rephca problem can 
be applied to the full random RNA problem. While qc 
itself depends on the disorder strength P^D, it converges 
in the weak disorder limit {(3^D <C 1) towards a constant 
qc{P^D = 0)>l. Therefore, for ^ 1 we always have 
q < qc- The molten phase is perturbatively stable upon 
the introduction of disorder, making it the appropriate 
description of random RNA at high temperatures. 

Low temperature behavior: Next we determine whether 
the molten phase persists for all temperatures down to 
T = 0+. In the following, we will assume that long ran- 
dom RNA is in the molten phase for all temperatures, 
i.e., that the partition function for any substrand of large 
length is given by Eq. (^, with some effective value of q. 
Then, we will show that this assumption leads to a con- 
tradiction below some temperature T* > 0. This contra- 
diction implies that the molten phase description breaks 
down at some finite Tc > T*. To be specific, we will 
consider the sequence disorder model (|l]) in this analysis. 

We will again focus on the pinching free energy A_F. 
Under the assumption that the random sequences are 
described by the molten phase, it is given by Eq. (^ for 
large N and all T independently of the effective value 
of q. On the other hand, we can study this pinching 
free energy for each given sequence of bases drawn from 
the ensemble of random sequences. For each such se- 
quence, we can look for a continuous segment of i <^ N 
Watson-Crick pairs {hi, bj){b,+i,bj_i) . . . (6i+f_i, 6j_£+i) 
where the bases bi . . . are within the first half of 

the molecule and the bases bj-i+i ■ ■ - bj are in the second 
half. For random sequences, the probability of finding 
such complementary segments decreases exponentially 
with the length ^ in a given region, with the largest £ 
in a sequence of length A^ being typically In N/ In 2 . 

Now we calculate the pinching free energy AF = 
-Fpinchcd - -Funpinchcd by evaluating the two terms sep- 
arately. The partition function corresponding to the un- 
pinched free energy contains at least all the configurations 
in which the two complementary segments hi . . 
and ■ ■ - bj are completely paired. Thus, 



unpinchcd 



paired 



(7) 



where impaired is the free energy of the ensemble of struc- 
tures in which the two complementary segments are 
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paired. The latter is the sum of the energy of the paired 
segments and those of the two remaining substrands of 
lengths Li=j — i — 2£+l and L2=N + i—j — l, i.e., 

-Fpaircd=-^Urn+ (N -2£) fo + fAlfiT [ln(Li )+ln(L2)] . (8) 

The free energy -Fpinchcd is, by the assumption of the 
molten phase, the interaction energy of the pinched base 
pair (61, plus the free energy of the two substrands 

^2 ■ • ■ and 6jv/2+i • • ■ ^at- According to Eq. (||), 

this is i^pinchcd — foN + 2 X ^ksTlnN up to terms in- 
dependent of N. Combining this with Eqs. (|^) and (||), 
and noting that £ is typically of the order In N/ In 2 and 
Li, L2 are typically proportional to N, wc finally obtain 
AF > [u,„ + 2/0] In TV/ In 2 for very large N. This is only 
consistent with Eq. (^) if 

|fcBr> K„ + 2/o]/ln2. (9) 

Since /o is a free energy per length, the right hand 
side of this equation is a decreasing function of temper- 
ature. Moreover, /o(T = 0) is the average free energy 
per length of the minimum energy secondary structures 
of random sequences. The lowest possible energy any 
structure of an RNA molecule of N bases can achieve is 
— ^-Um, i.e., if all bases form Watson-Crick pairs. How- 
ever, if the sequence disorder leads to frustration, there 
is always a finite fraction of bases which cannot be in- 
corporated into Watson-Crick base pairs and thus 
/o(T = 0)>-u,„/2. Therefore, the right hand side of (|) 
is a decreasing function of T starting at some positive 
value at T = 0. It follows that there is some unique tem- 
perature T* below which the consistency condition (j^) 
breaks down, implying the inconsistency of the molten 
phase assumption in this regime. From this we conclude 
that there must be a phase transition away from the 
molten phase at some critical temperature Tc>T*> 0. 
Numerically, we find that /o(T = 0) w — 0.46 for Um = 1, 
yielding fc^T* « 0.075. This is consistent with the nu- 
merically observed change between the low- and high- 
temperature behaviors at ksTc ~ 0.25 ||l3|. Improved 
estimates of Tc can be made by relaxing the condition 
of perfect complementarity of the two segments; this as 
well as a detailed characterization of the low temperature 
phase will be discussed elsewhere. 

Conclusions: We studied the behavior of random RNA 
sequences in the regime below the denaturation transi- 
tion. A two-replica calculation shows that disorder is per- 
turbatively irrelevant, i.e., an RNA molecule with weak 
sequence disorder is in the molten phase where many 
secondary structures with comparable total energy coex- 
ist. By further considering the rare regions of strong se- 
quence complementarity, we show that the molten phase 



cannot exist down to arbitrarily low temperatures, and 
thereby establish the existence of a distinct low tempera- 
ture phase below some critical temperature Tc > 0. Our 
analysis follows the approach used in Ref. ||l5| to show 
the relevance of disorder in the denaturation of double- 
stranded DNA. It will be interesting to see whether a 
renormalization group theory along the line of Ref. 
may be constructed to elucidate the low-temperature 
phase of the random RNA problem. 



Acknowledgments: The authors benefited from discus- 
sions with U. Gerland, D. Moroz, and especially L.-H. 
Tang regarding the role of rare regions. This work is 
supported by the NSF through grants no. DMR-9971456 
and DBI-9970199. 



[1] see, e.g., R.F. Gesteland and J.F. Atkins eds., The RNA 

world : the nature of modern RNA suggests a prebiotic 

RNA world (Cold Spring Harbor Laboratory Press, Cold 

Spring Harbor, 1993). 
[2] K.A. Dill et al, Protein Set. 4, 561 (1995); J.N. Onuchic, 

Z. Luthey-Schulten, and P.G. Wolynes, Annu. Rev. Phys. 

Chem. 48, 545 (1997); T. Garel, H. Orland, and E. 

Pitard, J. Phys. I 7, 1201 (1997); E.I. Shakhnovich, 

Curr. Op. Struct. Biol. 7, 29 (1997). 
[3] I. Tinoco Jr and C. Bustamante, J. Mol. Biol. 293 271 

(1999) and references therein. 
[4] J.S. McCaskill, Biopolymers 29, 1105 (1990). 
[5] M. Zuker and P. Stiegler, Nucl. Acid. Res. 9, 133 (1981); 

I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, 

M. Tacker, and P. Schuster, Monatshefte f. Chemie 125, 

167 (1994). 

[6] S.M. Freier, R. Kierzek, J.A. Jaeger, N. Sugimoto, M.H. 

Caruthers, T. Neilson, and D.H. Turner, Proc. Natl. 

Acad. Set. USA 83, 9373 (1986). 
[7] P.G. Higgs, Phys. Rev. Lett. 76, 704 (1996). 
[8] A. Pagnani, G. Parisi, and E. Ricci-Tersenghi, Phys. Rev. 

Lett. 84, 2026 (2000), Phys. Rev. Lett. 86, 1383 (2001); 

A.K. Hartmann, ibid. 86, 1382 (2001). 
[9] R. Bundschuh and T. Hwa, Phys. Rev. Lett. 83, 1479 

(1999). 

[10] P.G. Higgs, QuaH. Rev. Biophys. 33, 199 (2000). 

[11] P.G. de Gennes, Biopolymers 6, 715 (1968). 

[12] M.S. Waterman, Advan. in Math. Suppl. Studies 1, 167, 

Academic Press, New York, 1978. 
[13] R. Bundschuh and T. Hwa, in preparation. 
[14] R. Arratia, L. Gordon, and M. Waterman, Ann. Stat. 18, 

539 (1990). 

[15] L.-H. Tang and H. Chate, Phys. Rev. Lett. 38, 830 (2001). 

[16] This is not the case for the sequence disorder model at 
an alphabet size 2 unless a minimum hairpin length con- 
straint is introduced as in Ref. H. 



